Archiving data in cloud/object storage using local metadata staging

ABSTRACT

Techniques for archiving data in cloud/object storage using local metadata staging are provided. In one set of embodiments, a computer system residing at an on-premises site comprising on-premises storage can receive a snapshot of a dataset to be archived. The computer system can package data in the snapshot into one or more fixed-size data chunks and upload the one or more fixed-size data chunks to cloud/object storage. Further, concurrently with the packaging and the uploading, the computer system can stage metadata for the snapshot in the on-premises storage. Then, upon uploading all of the data of the snapshot, the computer system can upload the metadata staged in the on-premises storage to the cloud/object storage.

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application is related to commonly-owned U.S. patent application Ser. No. 16/035,455, entitled “Managing Overwrites When Archiving Data in Cloud/Object Storage,” which is filed concurrently herewith. The entire contents of this application are incorporated herein by reference for all purposes.

BACKGROUND

In computing, “object storage” is a data storage model that manages data in the form of containers referred to as objects, rather than in the form of files (as in file storage) or in the form of blocks (as in block storage). “Cloud/object storage” is an implementation of object storage that maintains these objects on servers that are accessible via the Internet. Examples of commercially-available cloud/object storage services include Amazon's Simple Storage Service (S3) and Google Cloud Storage.

Cloud/object storage generally offers high scalability, high durability, and relatively low cost per unit of storage capacity, which makes it an attractive solution for organizations seeking to archive large volumes of data for long-term backup and recovery purposes. However, there are a number of complexities that make it difficult to use existing cloud/object storage services as a backup target. For example, many existing cloud/object storage services can only guarantee eventual consistency to clients, which means that if an update is made to an object, all subsequent client accesses to that object will eventually, but not necessarily immediately, return the object's updated value. Some cloud/object storage services mitigate this by guaranteeing read-after-write consistency for newly created objects. But, without a stronger consistency model that also guarantees read-after-write consistency for modified objects, it is difficult to build a data backup/restore system that ensures clients have a consistent view of the archived data.

Further, the network bandwidth between an organization's on-premises (i.e., local) site and cloud/object storage is usually limited due to the need to traverse the Internet. Similarly, the latency from on-premises equipment to cloud/object storage is relatively high, and network timeouts or other network issues can be prevalent. These factors increase the costs of writing a large number of objects per backup task and can cause write throttling to occur.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a system environment according to an embodiment.

FIG. 2A depicts an initial snapshot upload workflow according to an embodiment.

FIG. 2B depicts an example structure of a cloud archive after the initial snapshot workflow of FIG. 2A according to an embodiment.

FIG. 3 depicts a workflow for staging snapshot metadata using an arbitrary mapping approach according to an embodiment.

FIG. 4 depicts a delta snapshot upload workflow according to an embodiment.

FIGS. 5A and 5B depict workflows for managing overwrites to the superblock chunk of a cloud archive according to an embodiment.

FIG. 5C depicts an example structure of a cloud archive after the creation of one or more .ARCHIVE files for the superblock chunk according to an embodiment.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous examples and details are set forth in order to provide an understanding of various embodiments. It will be evident, however, to one skilled in the art that certain embodiments can be practiced without some of these details, or can be practiced with modifications or equivalents thereof.

1. Overview

Embodiments of the present disclosure describe techniques that can be performed by a client system running at an organization's on-premises site for backing up (i.e., archiving) data from the on-premises site to cloud/object storage using a mechanism referred to as local metadata staging. According to one set of embodiments, the client system can (1) receive an initial snapshot of a source dataset (e.g., file) to be archived, (2) package the data blocks of the snapshot into fixed-size chunks, and (3) upload each chunk, as it is filled with snapshot data, to the cloud/object storage. The uploaded chunks can be appended/added to a data structure maintained on the cloud/object storage for the source dataset, referred to as the dataset's cloud archive.

Simultaneously with (2) and (3), the client system can locally stage (e.g., create and update) metadata describing the structure of the snapshot (as it is stored in cloud/object storage) in on-premises storage. This metadata, which is staged in a data structure on the on-premises storage referred to as the dataset's resident archive, can take the form of a B+ tree. The leaf nodes of the B+ tree can identify cloud physical block addresses (CPBAs) of the cloud archive where the data blocks of the snapshot are uploaded.

Finally, once all of the snapshot data has been uploaded and the locally-staged snapshot metadata has been fully updated, the client system can upload the snapshot metadata (as well as archive metadata) in the form of chunks to the cloud archive residing in cloud/object storage, thereby completing the archival/upload workflow for the snapshot. The client system can subsequently repeat this workflow for further snapshots of the dataset by calculating a delta between a given snapshot and the previous snapshot and uploading the data and modified metadata for the delta.

The foregoing and other aspects of the present disclosure are described in further detail in the sections that follow.

2. System Environment

FIG. 1 is a simplified block diagram of a system environment 100 in which embodiments of the present disclosure may be implemented. As shown, system environment 100 includes an on-premises client system 102 at a customer (i.e., on-premises) site 104 that is connected via the Internet 106 to a cloud/object storage service/system 108. Client system 102 may be, e.g., a physical computer system or a virtual machine (VM). Cloud/object storage 108 may be any such storage service/system known in the art, such as Amazon's S3.

Although an exhaustive discussion of cloud/object storage 108 is beyond the scope of this disclosure, the following are a few salient characteristics that may be exhibited by cloud/object storage 108 in certain embodiments:

-   Each object in cloud/object storage 108 (also referred to herein as a “chunk”) can be maintained in a flat address space and can include the data for the object itself (i.e., the object's data payload), a variable amount of object metadata, and a globally unique identifier (i.e., key).
-   Cloud/object storage 108 can expose a relatively simple data access API (application programming interface) to client system 102 that includes (1) a GET(k) function for retrieving an object identified by specified key k; (2) a PUT(o, k) function for creating or updating specified object o identified by specified key k; and (3) a DELETE(k) function for deleting an object identified by specified key k.
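
By way of illustration only, the following is a minimal sketch of such a key-value object interface. The ObjectStore class and its in-memory dictionary are our own simplification and are not part of any particular provider's SDK; a real client would issue authenticated requests to the storage service instead.

```python
class ObjectStore:
    """Minimal sketch of the GET/PUT/DELETE interface described above.

    The in-memory dict stands in for a real cloud/object storage bucket
    (a flat address space of key -> object payload)."""

    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, obj: bytes, key: str) -> None:
        # PUT(o, k): create or update object o identified by key k
        self._objects[key] = obj

    def get(self, key: str) -> bytes:
        # GET(k): retrieve the object identified by key k
        return self._objects[key]

    def delete(self, key: str) -> None:
        # DELETE(k): delete the object identified by key k
        self._objects.pop(key, None)
```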

Typically, cloud/object storage 108 will be owned and maintained by a storage service provider, such as Amazon, that is distinct from the entity that owns customer site 104. However, in some embodiments, cloud/object storage 108 can be part of a private cloud that is owned/maintained by the same entity as customer site 104.

In addition to being connected to cloud/object storage 108, client system 102 is also connected to an on-premises storage system 110 that includes a dataset 112. Dataset 112 may be, e.g., virtual disk data for one or more VMs, a document repository, or any other type of dataset that is modified on an ongoing basis at customer site 104. In this environment, the goal of client system 102 is to periodically archive dataset 112 from on-premises storage 110 to cloud/object storage 108 for data protection, such that the most recently backed-up copy of dataset 112 can be restored from cloud/object storage 108 if a disaster or failure occurs that causes the on-premises copy of the dataset to be lost. However, as mentioned previously, there are a number of challenges that make it difficult to accomplish this in an efficient and performant manner (e.g., weak consistency guarantees offered by cloud/object storage 108, low bandwidth and high latency between customer site 104 and cloud/object storage 108, etc.).

To address the foregoing and other related issues, client system 102 of FIG. 1 is enhanced to include a novel archive management agent 114. In various embodiments, archive management agent 114 may be implemented in software, in hardware, or a combination thereof. In a particular embodiment, archive management agent 114 may be implemented as a user-mode application and thus can make use of certain network security protocol libraries for communicating with cloud/object storage 108, such as Transport Layer Security (TLS), that are only available in user space.

As detailed in the sections that follow, archive management agent 114 can employ techniques for archiving point-in-time copies (i.e., snapshots) of dataset 112 to cloud/object storage 108 in a manner that streams the new/modified data for each snapshot (in the form of fixed-size chunks) to a “cloud archive” 116 in cloud/object storage 108, but stages metadata for the snapshot locally on client system 102 in a “resident archive” 118 while the snapshot data is being uploaded. This metadata can comprise a B+ tree structure whose leaf nodes point to cloud physical block addresses (CPBAs) in cloud archive 116 where each data block of the snapshot is uploaded, and whose intermediate nodes guide traversal down the tree (based on logical block addresses of dataset 112).

Then, when all of the new/modified snapshot data has been uploaded and the locally-staged snapshot metadata has been fully updated, archive management agent 114 can upload the snapshot metadata in the form of chunks to cloud archive 116. Archive management agent 114 can also upload archive metadata comprising information regarding the snapshot (e.g., an association between the snapshot ID and a pointer to the root node of the snapshot's B+ tree, the snapshot's range of data chunks, the snapshot's range of metadata chunks, checksums, etc.). Once this metadata upload is done, the archival/upload workflow for the snapshot is complete. Archive management agent 114 can subsequently repeat this workflow for delta changes to dataset 112 captured in further snapshots, thereby archiving those further snapshots in cloud archive 116.

With the high-level approach described above, a number of advantages can be realized. First, because the metadata for the snapshot upload is staged locally and updated/finalized in on-premises storage 110 before being sent to cloud/object storage 108, there is no need to overwrite snapshot metadata in the cloud; this metadata is uploaded exactly once for each snapshot, at the end of the archival/upload workflow (note that there will typically be a large amount of metadata “churn” during this workflow as snapshot data chunks are processed and uploaded, due to the creation and splitting of B+ tree nodes). Similarly, snapshot data is always appended to (rather than overwritten in) cloud archive 116. These aspects avoid the problems raised by the eventual consistency model employed by existing cloud/object storage systems.

Second, by batching and uploading snapshot data and metadata in fixed-size chunks (i.e., objects) rather than on a per-block basis, archive management agent 114 can more efficiently use the available bandwidth between customer site 104 and cloud/object storage 108.

Third, in certain embodiments the locally-staged metadata in resident archive 118 can be leveraged by client system 102 to accelerate various archive operations, such as delete and restore.

It should be noted that two different approaches are possible for allocating local and cloud PBAs to snapshot metadata as the metadata is staged during the archival/upload workflow. According to a first approach (referred to herein as the “one-to-one mapping” approach), a particular predefined range of local physical block addresses (LPBAs) may be reserved for snapshot metadata in resident archive 118 of on-premises storage 110 and an identical predefined range of cloud physical block addresses (CPBAs) may be reserved for snapshot metadata in cloud archive 116 of cloud/object storage 108. For example, a range of zero to 2 terabytes may be reserved in the LPBA space of resident archive 118 and the CPBA space of cloud archive 116, respectively. Note that the CPBA of a given block in cloud archive 116 is determined by its chunk ID, the chunk size, and its offset within that chunk; for instance, if agent 114 uploads metadata to cloud archive 116 in 1 MB chunks, the CPBA of a metadata block stored at chunk 4, offset 4K will be (4×1 MB+4K)=4100K.
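
For illustration, the address arithmetic just described can be expressed as follows; the cpba helper name is ours, not from the disclosure, and the assertion simply reproduces the 4100K example above.

```python
def cpba(chunk_id: int, chunk_size: int, offset: int) -> int:
    """Cloud physical block address = chunk ID x chunk size + offset within the chunk."""
    return chunk_id * chunk_size + offset

# The worked example from the text: 1 MB chunks, metadata block at chunk 4, offset 4K.
MB, KB = 1024 * 1024, 1024
assert cpba(4, 1 * MB, 4 * KB) == 4100 * KB  # i.e., 4100K
```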

Then, at the time of creating/staging metadata locally in resident archive 118 during a snapshot upload, archive management agent 114 can allocate data blocks sequentially from the reserved LPBA range in resident archive 118 for holding the metadata, and at the time of uploading the locally staged metadata, archive management agent 114 can pack those metadata blocks according to the same sequence into chunks having sequential chunk IDs within the reserved CPBA range and upload the chunks to cloud archive 116. This effectively results in a one-to-one mapping between the LPBAs of the metadata blocks in resident archive 118 and the CPBAs of those metadata blocks in cloud archive 116, which avoids the need to perform any address translations at the time the metadata blocks are uploaded to cloud archive 116. This approach is explained in further detail in Section 3 below.

According to a second approach (referred to herein as the “arbitrary mapping” approach), there is no correspondence between the LPBAs used to store metadata blocks on-premises and the CPBAs used to store those same metadata blocks in cloud/object storage; rather, agent 114 uses any available blocks in the LPBA range of resident archive 118 to hold metadata during the local staging. As a result, once all of the metadata blocks for a given snapshot have been fully updated in on-premises storage and are ready to be uploaded to cloud/object storage, agent 114 needs to identify the pointers in the B+ tree structure created for the snapshot (i.e., the pointers pointing to nodes within the B+ tree) and update those pointers to properly point to the CPBAs where those nodes will reside in the CPBA range of cloud archive 116. This approach is explained in further detail in Section 4 below.

It should be appreciated that system environment 100 of FIG. 1 is illustrative and not intended to limit embodiments of the present disclosure. For example, although only a single on-premises client system 102 is shown, any number of client systems may be configured to interact with cloud/object storage 108 for the purpose of backing up or restoring dataset 112, potentially on a concurrent basis. Further, the various entities depicted in FIG. 1 may be organized according to alternative configurations or arrangements and/or may include components or functions that are not specifically described. One of ordinary skill in the art will recognize other variations, modifications, and alternatives.

3. Initial Snapshot Upload Workflow

FIG. 2A depicts a workflow 200 that may be executed by archive management agent 114 for uploading/archiving an initial (i.e., first) snapshot of dataset 112 to cloud/object storage 108 using local metadata staging according to an embodiment. This workflow assumes that the metadata for the snapshot will be mapped in a one-to-one manner from the LPBA of resident archive 118 to the CPBA of cloud archive 116.

Starting with step 202, an initial snapshot (e.g., snapshot S0) of dataset 112 can be taken on on-premises storage 110 and made available to archive management agent 114. Since this is the first snapshot of dataset 112, the snapshot will contain the entirety of the data of dataset 112.

At step 204, archive management agent 114 can allocate space on on-premises storage 110 for the resident archive of dataset 112 (i.e., resident archive 118), which will be used to locally stage metadata for the snapshots of dataset 112 that will be uploaded to cloud/object storage 108. The physical block address range that is allocated to resident archive 118 here is referred to as the local physical block address (LPBA) range of archive 118. As part of this step, archive management agent 114 can reserve a portion of the LPBA range for a “superblock,” which is a segment of resident archive 118 that stores metadata about the archive itself (e.g., snapshots in the archive, checksums, etc.). This superblock will typically be allocated one chunk, where “chunks” are the units of data that are uploaded by agent 114 to cloud/object storage 108. In various embodiments, one chunk may have a fixed size, such as 1 MB, 2 MB, 4 MB, etc. Archive management agent 114 can also reserve a portion of the LPBA range of resident archive 118 for storing snapshot metadata (e.g., a range of zero to 2 TB within the LPBA range).

Once archive management agent 114 has allocated space for resident archive 118 in on-premises storage 110, agent 114 can also initialize a “bucket” in cloud/object storage 108 corresponding to the cloud archive for dataset 112 (i.e., cloud archive 116) (step 206). This bucket is essentially a named container that is configured to hold cloud objects (i.e., chunks) representing the snapshot data/metadata for dataset 112 that is uploaded by agent 114. The cloud physical block address (CPBA) space of cloud archive 116 starts at zero and is extended each time a chunk is written to archive 116. Thus, since an unlimited number of chunks may generally be uploaded to cloud/object storage 108, the CPBA space of cloud archive 116 can potentially extend to infinity. The CPBA of a given block of data/metadata within cloud archive 116 can be calculated as chunk ID (i.e., ID of the chunk in which the block resides)×chunk size+offset (i.e., offset of the block within the chunk).

In various embodiments, as part of step 206, archive management agent 114 can create a superblock chunk in cloud archive 116 that corresponds to the superblock allocated in resident archive 118 at step 204. In addition, archive management agent 114 can reserve a range of CPBAs (i.e., a range of chunk IDs) in cloud archive 116 for snapshot metadata that is identical to the reserved metadata LPBA range in resident archive 118.

At step 208, archive management agent 114 can initialize a “data chunk ID” variable to some starting value X that corresponds to the chunk ID/location in the CPBA of cloud archive 116 where data chunks should begin being written (this may be, e.g., the first chunk ID after the reserved metadata range). Archive management agent 114 can then begin reading the data in the initial snapshot of dataset 112, on a block-by-block basis in increasing logical block address order (step 210).

At steps 212 and 214, for each data block read from the initial snapshot, archive management agent 114 can place the data block into a memory buffer of fixed size that corresponds to the fixed-size chunks that will be uploaded to cloud/object storage 108. For example, if agent 114 is configured to upload 4 MB chunks to cloud/object storage 108, the memory buffer will be 4 MB in size. Archive management agent 114 can assign a chunk ID to this memory buffer corresponding to the current value of the data chunk ID variable (step 216).

Further, at step 218, archive management agent 114 can build/update metadata (i.e., a B+ tree) for the initial snapshot based on the read data block and locally write this metadata to sequential blocks within the reserved metadata LPBA range of resident archive 118. The internal nodes of the B+ tree are nodes that guide tree traversal down to the leaf nodes. The leaf nodes, in turn, are configured to point to the CPBAs (i.e., chunk IDs and offsets) in cloud archive 116 where the data blocks of the snapshot will be archived. The keys of the internal nodes reflect the logical block address space of the snapshot file.

For instance, assume a new data block of the initial snapshot is placed into the memory buffer at step 214 (for upload to cloud/object storage 108). In this case, a new leaf node of the snapshot's B+ tree can be created at step 218 that includes a pointer to the CPBA of the data block (i.e., chunk ID of memory buffer×chunk size+offset), and this leaf node will be written to the next free block within the reserved metadata LPBA range of resident archive 118. Further, if the creation of the leaf node necessitates the creation of one or more parent (i.e., intermediate) nodes in the B+ tree per standard B+ tree node split criteria, such parent nodes will also be created and written sequentially into blocks in the reserved LPBA range of resident archive 118.

At step 220, archive management agent 114 can check whether the memory buffer used to hold data blocks from the snapshot has become full; if not, agent 114 can return to the start of the loop (step 212) to process the next data block. On the other hand, if the memory buffer has become full at step 220, archive management agent 114 can package the contents of the memory buffer into a data chunk, upload the data chunk (with its assigned chunk ID) to cloud archive 116 of cloud/object storage 108, and increment the data chunk ID variable (step 222) before reaching the end of the current loop iteration (step 224) and returning to the start of the loop. Although not explicitly shown, if the current data block is the last data block in the snapshot, archive management agent 114 can package and upload the contents of the memory buffer to cloud/object storage 108 even if it has not reached capacity.

Once all of the data blocks from the initial snapshot have been read and processed, archive management agent 114 can sequentially read the metadata blocks that have been written to the reserved metadata LPBA range of resident archive 118 (step 226), package the metadata blocks into fixed-size chunks in a manner similar to the data blocks (step 228), and then sequentially upload these metadata chunks to the reserved CPBA range of cloud archive 116 (step 230). These metadata chunks are assigned chunk IDs that result in the LPBAs of the metadata blocks in resident archive 118 matching one-to-one with the CPBAs of the metadata blocks as they are stored in cloud archive 116. Among other things, this one-to-one mapping ensures that the internal pointers in the B+ tree represented by the metadata (i.e., pointers pointing to internal nodes in the tree) are still valid once uploaded to cloud/object storage 108, and thus the tree can be properly traversed using the cloud-archived metadata.

Finally, at step 232, archive management agent 114 can upload archive metadata to the superblock chunk in cloud archive 116 that includes, e.g., an association between the ID of the current snapshot (e.g., S0) and the PBA of the root node of the B+ tree for the snapshot (thereby allowing the metadata for the snapshot to be found and traversed), as well as potentially other archive metadata (e.g., the range of metadata chunks for the snapshot, the range of data chunks for the snapshot, checksums, etc.). Once this is completed, the archival/upload process for the snapshot is done and the workflow can end.
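
As an illustration only, the following is a highly simplified sketch of workflow 200's data upload loop and one-to-one metadata staging. All names here (upload_initial_snapshot, resident_archive, btree, and the put-style cloud client) are hypothetical, the reserved metadata range is an assumed 512K chunk IDs, and B+ tree node creation and splitting are reduced to a single insert call; this is not the disclosed implementation.

```python
CHUNK_SIZE = 4 * 1024 * 1024      # assumed fixed chunk size (4 MB)
RESERVED_METADATA_CHUNKS = 512 * 1024   # assumed reserved metadata range: chunk IDs 1..512K

def cpba(chunk_id, chunk_size, offset):
    # CPBA = chunk ID x chunk size + offset (as described in the text)
    return chunk_id * chunk_size + offset

def upload_initial_snapshot(snapshot_blocks, cloud, resident_archive, btree):
    """Sketch of workflow 200: stream data chunks to the cloud archive while
    staging B+ tree metadata locally, then upload the staged metadata."""
    data_chunk_id = RESERVED_METADATA_CHUNKS + 1   # first chunk ID after the reserved range
    buffer = bytearray()

    for lba, block in enumerate(snapshot_blocks):              # steps 210-212
        offset = len(buffer)
        buffer += block                                         # step 214
        # Step 218: record the block's future CPBA in the locally staged B+ tree.
        btree.insert(lba, cpba(data_chunk_id, CHUNK_SIZE, offset))
        if len(buffer) >= CHUNK_SIZE:                           # step 220
            cloud.put(bytes(buffer), key=f"chunk-{data_chunk_id}")   # step 222
            data_chunk_id += 1
            buffer = bytearray()
    if buffer:                                                  # flush the final partial chunk
        cloud.put(bytes(buffer), key=f"chunk-{data_chunk_id}")

    # Steps 226-230: pack the sequentially staged metadata blocks into chunks whose
    # chunk IDs fall inside the reserved range, preserving the one-to-one LPBA/CPBA mapping.
    for meta_chunk_id, meta_chunk in enumerate(resident_archive.metadata_chunks(), start=1):
        cloud.put(meta_chunk, key=f"chunk-{meta_chunk_id}")

    # Step 232: record the snapshot ID and B+ tree root pointer in the superblock (chunk 0).
    cloud.put(resident_archive.superblock(snapshot_id="S0", root=btree.root_cpba()),
              key="chunk-0")
```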

FIG. 2B is a diagram 250 that illustrates the contents of cloud archive 116 at the conclusion of upload workflow 200 according to an embodiment. As shown in diagram 250, cloud archive 116 includes a superblock chunk 252 (associated with chunk ID 0), a number of metadata chunks 254(1)-(M) for the uploaded snapshot (associated with chunk IDs 1 to M within reserved metadata range 256), and a number of data chunks 258(1)-(N) (associated with chunk IDs X to X+N, where X is the first chunk ID after the end of reserved metadata range 256). In this example, the CPBA space of cloud archive 116 extends from zero to (X+N)×S, where S is the fixed size of each metadata/data chunk. This CPBA space will be extended further as new chunks are uploaded to cloud archive 116 for subsequent snapshots of dataset 112.

4. Alternative Metadata Mapping (Arbitrary)

As mentioned previously, as an alternative to performing one-to-one mapping of metadata between the LPBA of resident archive 118 and the CPBA of cloud archive 116, archive management agent 114 can instead arbitrarily allocate blocks for metadata from the LPBA during local metadata staging. With this alternative approach, there is no reserved address range for metadata in the LPBA or CPBA; instead, as agent 114 is building the B+ tree for the snapshot, the agent can allocate blocks from anywhere in the LPBA and use those allocated blocks to hold the B+ tree data (i.e., node information). Then, when all data chunks have been sent to the cloud, archive management agent 114 can perform a process for uploading the metadata to cloud/object storage 108 that includes translating metadata pointers that point to LPBAs (i.e., pointers to internal tree nodes) to instead point to the appropriate CPBAs where the metadata will be uploaded. FIG. 3 depicts a workflow 300 of this metadata upload process according to an embodiment.

Starting with step 302, archive management agent 114 can walk through the B+ tree created/built during the data upload phase of archival workflow 200, from the lowest to the highest level in the tree.

For each encountered tree node (step 304), archive management agent 114 can place the node into a fixed-size memory buffer corresponding to the size of a single chunk (step 306) and can assign a chunk ID to this buffer (step 308). Agent 114 can start this chunk ID at the last value of the data chunk ID variable described earlier, such that metadata chunks are written to the CPBA immediately following the data chunks for the snapshot.

At step 310, archive management agent 114 can record the current chunk ID and the offset for the node within the chunk in a temporary mapping table. This mapping table can associate the cloud chunk ID/offset for the node with the node's LPBA in resident archive 118.

Then, if the node includes a pointer to an LPBA for a child node in the B+ tree (step 312), archive management agent 114 can determine the cloud chunk ID/offset for that child node from the temporary mapping table based on its LPBA (step 314) and can replace the LPBA with the chunk ID/offset in the node, thereby translating the LPBA to a CPBA (i.e., chunk ID/offset) (step 316).

Finally, if the memory buffer is now full (step 318), archive management agent 114 can upload the contents of the memory buffer as a chunk (with its assigned chunk ID) to cloud archive 116 in cloud/object storage 108, thereby archiving it there (step 320). The current loop iteration can then end (step 322) and archive management agent 114 can return to the top of the loop (step 302) and repeat this process until all tree nodes have been processed.
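
A minimal sketch of this pointer-translation pass follows, assuming the bottom-up node ordering described at step 302 so that every child node already has an assigned CPBA when its parent is processed. The node attributes (lpba, children, serialize) and the put-style cloud client are hypothetical placeholders, not the disclosed data structures.

```python
def upload_metadata_arbitrary(tree_nodes_bottom_up, next_chunk_id, chunk_size, cloud):
    """Sketch of workflow 300: pack B+ tree nodes into chunks, translating each LPBA
    child pointer into the CPBA already assigned to that child earlier in the walk."""
    lpba_to_cpba = {}                                  # temporary mapping table (step 310)
    buffer, chunk_id = bytearray(), next_chunk_id      # metadata chunks follow the data chunks

    for node in tree_nodes_bottom_up:                  # steps 302-304, lowest level first
        # Steps 312-316: rewrite child pointers from local (LPBA) to cloud (CPBA) addresses.
        for child in node.children:
            child.pointer = lpba_to_cpba[child.pointer]
        # Step 310: remember where this node will live in the cloud (chunk ID x size + offset).
        lpba_to_cpba[node.lpba] = chunk_id * chunk_size + len(buffer)
        buffer += node.serialize()                     # steps 306-308
        if len(buffer) >= chunk_size:                  # steps 318-320
            cloud.put(bytes(buffer), key=f"chunk-{chunk_id}")
            chunk_id += 1
            buffer = bytearray()
    if buffer:                                         # flush the final partial metadata chunk
        cloud.put(bytes(buffer), key=f"chunk-{chunk_id}")
```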

With workflow 300, the structure of cloud archive 116 shown in FIG. 2B will be slightly different since there is no reserved metadata range 256; instead, the metadata chunks for the uploaded snapshot (254(1)-(M)) will appear in the CPBA after data chunks 258(1)-(N).

5. Delta Snapshot Upload Workflow

FIG. 4 depicts a workflow 400 that may be executed by archive management agent 114 for uploading/archiving a delta (e.g., second or later) snapshot of dataset 112 to cloud/object storage 108 using local metadata staging according to an embodiment. This workflow assumes that at least one snapshot of dataset 112 has already been uploaded per workflow 200 of FIG. 2A and now a second snapshot needs to be uploaded that captures changes to dataset 112 since the first snapshot.

The steps of workflow 400 are largely similar to those of workflow 200; however, rather than starting with an initial snapshot of dataset 112, a new snapshot of the dataset is taken at block 402 and a delta between the new snapshot and the immediately previous snapshot (i.e., the data blocks that have changed between the two snapshots) is determined at block 404. This delta is then read by archive management agent 114 and processed at subsequent blocks 406-428 in a manner that is analogous to blocks 210-232 of workflow 200.

It should be noted that, as part of building the B+ tree for the delta snapshot data, archive management agent 114 can reuse the nodes of the B+ trees of previous snapshots (in other words, point to existing tree nodes of previous snapshot(s) for portions of the tree that have not changed). For portions of the B+ tree that do need to be modified for the delta snapshot data, archive management agent 114 can employ copy-on-write to create new copies of those specific nodes.
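
To make the copy-on-write idea concrete, here is a minimal sketch under our own assumptions: only the nodes on the path from the root to a modified leaf are cloned, and every other subtree is shared with the previous snapshot's tree. The clone, child_index_for, and set_pointer helpers are hypothetical, and B+ tree node splits are ignored for brevity.

```python
def cow_update(root, lba, new_cpba):
    """Sketch of copy-on-write for a delta snapshot's B+ tree: clone only the nodes on
    the root-to-leaf path for a changed logical block; share all other subtrees.

    clone() is assumed to copy a single node (including its child-pointer list)
    without copying the child nodes themselves."""
    new_root = root.clone()
    node = new_root
    while not node.is_leaf:
        idx = node.child_index_for(lba)                    # which child covers this LBA
        node.children[idx] = node.children[idx].clone()    # copy-on-write along the path only
        node = node.children[idx]
    node.set_pointer(lba, new_cpba)                        # leaf now references the newly uploaded block
    return new_root                                        # root of the delta snapshot's B+ tree
```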

In addition, it should be noted that at step 428 archive management agent 114 overwrites the existing superblock chunk in cloud archive 116 in order to update it with the metadata for the current snapshot (e.g., the snapshot ID and a pointer to the root node of the snapshot's B+ tree). As mentioned previously, performing such overwrites in cloud/object storage 108 can raise consistency issues since most cloud/object storage systems only guarantee eventual consistency. One mechanism for managing this issue is addressed in the next section below.

6. Managing Overwrites to Superblock Chunk

Per block 428 of workflow 400, archive management agent 114 overwrites the superblock chunk in cloud archive 116 at the conclusion of the snapshot archival/upload process in order to update the superblock with metadata regarding the uploaded snapshot (e.g., the snapshot ID and a pointer to the snapshot's B+ tree root node). Since overwrites are only eventually consistent in most cloud/object storage systems, this can cause numerous problems when the superblock needs to be accessed again for various archive operations. For example, consider a scenario where a client wishes to restore the most recently archived snapshot of dataset 112 (e.g., snapshot S100). In this case, the client will read the superblock chunk of cloud archive 116, which was updated with information regarding S100 during the last upload workflow. However, assuming cloud/object storage 108 is only eventually consistent, the read (i.e., GET) operation requested by the client may return an older version of the superblock that identifies a snapshot that is older than the most recent snapshot (e.g., snapshot S90). Thus, the client may begin restoring from older snapshot S90 under the erroneous belief that it is restoring the latest version of the data.

To address this, FIG. 5A depicts a workflow 500 that can be performed by archive management agent 114 at the time of overwriting the superblock chunk in cloud archive 116, and FIG. 5B depicts a complementary workflow 550 that can be performed by a client at the time of accessing the superblock in order to identify the most recently uploaded snapshot. Taken together, these two workflows can ensure that the client can always correctly determine the most recent snapshot in cloud archive 116, despite the eventual consistency property of cloud/object storage 108 (this solution assumes that cloud/object storage 108 supports read-after-write consistency for newly created objects).

Starting with step 502 of workflow 500, archive management agent 114 can overwrite (i.e., update) the superblock chunk of cloud archive 116 with archive metadata for the most recently uploaded snapshot. This archive metadata can include an identifier of the snapshot and a pointer (e.g., chunk ID and offset) to the root node of the snapshot's B+ tree. This step is substantially similar to step 428 of workflow 400.

However, rather than simply overwriting the superblock chunk, archive management agent 114 can also create a new instance of a special file in cloud archive 116 (referred to as a “.ARCHIVE” file) that has a version number corresponding to the snapshot ID number (step 504). For example, if the most recently uploaded snapshot is SX, the .ARCHIVE file created at block 504 will have a version number X (e.g., .ARCHIVE.X). This newly created file version will be readable by all clients immediately after its creation under the property of read-after-write consistency. This is illustrated in diagram 570 of FIG. 5C, which shows cloud archive 116 with .ARCHIVE files ARCHIVE.0 to ARCHIVE.X (one file for each uploaded snapshot S0 to SX). In various embodiments, these .ARCHIVE files do not contain any data content of substance; instead, the reason for creating these files is simply to track the ID of the most recently uploaded/archived snapshot by virtue of the .ARCHIVE file version numbers.

Turning now to workflow 550, at the time a client wishes to determine the most recently archived snapshot for dataset 112, the client can first read the superblock chunk in cloud archive 116 and determine the latest snapshot ID recorded there (step 552). For example, the client may determine that the latest snapshot ID in the superblock is SY, where Y is some number. The client can then check whether a .ARCHIVE file exists in cloud archive 116 with a version number corresponding to Y+1 (step 554). If not, the client can conclude that SY is the latest snapshot archived for dataset 112 (step 556).

However, if the client determines at step 554 that a .ARCHIVE file does exist with a version number corresponding to Y+1, the client can set Y=Y+1 (step 558) and then return to step 554 to continue checking whether a .ARCHIVE file exists with a further incremented version number. This process can repeat for increasing values of Y until the latest version of the .ARCHIVE file is found at step 556, which identifies the most recently archived snapshot of dataset 112.
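
A minimal sketch of this client-side probing loop, assuming the snapshot ID read from the superblock is available as an integer and that the existence of a .ARCHIVE version can be tested with the storage service's GET primitive (a real client might instead use a HEAD request or a prefix listing); the function names here are ours.

```python
def latest_snapshot_id(cloud, superblock_snapshot_id: int) -> int:
    """Sketch of workflow 550: starting from the snapshot ID recorded in the (possibly
    stale) superblock, probe for .ARCHIVE.<Y+1>, .ARCHIVE.<Y+2>, ... until none exists."""
    y = superblock_snapshot_id
    while archive_file_exists(cloud, y + 1):   # step 554
        y += 1                                  # step 558
    return y                                    # step 556: Y identifies the latest snapshot

def archive_file_exists(cloud, version: int) -> bool:
    # Hypothetical existence check built on the GET primitive; assumes a missing key
    # raises KeyError, as in the ObjectStore sketch earlier in this description.
    try:
        cloud.get(f".ARCHIVE.{version}")
        return True
    except KeyError:
        return False
```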

Finally, once the latest .ARCHIVE file (and thus the latest snapshot) is found, the client can take an appropriate action based on this information (step 560). For example, if the client is attempting to restore the latest snapshot and determines that the latest snapshot differs from what is found in the superblock at step 552, the client may wait until the superblock properly reflects the archive metadata for the latest snapshot. Alternatively, the client may simply decide to begin restoring from the older snapshot found in the superblock.

Certain embodiments described herein can employ various computer-implemented operations involving data stored in computer systems. For example, these operations can require physical manipulation of physical quantities; usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they (or representations of them) are capable of being stored, transferred, combined, compared, or otherwise manipulated. Such manipulations are often referred to in terms such as producing, identifying, determining, comparing, etc. Any operations described herein that form part of one or more embodiments can be useful machine operations.

Further, one or more embodiments can relate to a device or an apparatus for performing the foregoing operations. The apparatus can be specially constructed for specific required purposes, or it can be a general purpose computer system selectively activated or configured by program code stored in the computer system. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations. The various embodiments described herein can be practiced with other computer system configurations including handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

Yet further, one or more embodiments can be implemented as one or more computer programs or as one or more computer program modules embodied in one or more non-transitory computer readable storage media. The term non-transitory computer readable storage medium refers to any data storage device that can store data which can thereafter be input to a computer system. The non-transitory computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer system. Examples of non-transitory computer readable media include a hard drive, network attached storage (NAS), read-only memory, random-access memory, flash-based nonvolatile memory (e.g., a flash memory card or a solid state disk), a CD (Compact Disc) (e.g., CD-ROM, CD-R, CD-RW, etc.), a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The non-transitory computer readable media can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations can be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component can be implemented as separate components.

As used in the description herein and throughout the claims that follow, “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. These examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Other arrangements, embodiments, implementations and equivalents can be employed without departing from the scope hereof as defined by the claims.

What is claimed is:
1. A method for archiving data in cloud/object storage using local metadata staging, the method comprising: receiving, by a computer system residing at an on-premises site comprising on-premises storage, a snapshot of a dataset to be archived, the snapshot including data of the dataset; packaging, by the computer system, the data included in the snapshot into one or more fixed-size data chunks and uploading the one or more fixed-size data chunks to the cloud/object storage; concurrently with the packaging and the uploading, staging, by the computer system, metadata for the snapshot in the on-premises storage; and upon uploading all of the one or more fixed-size data chunks, uploading, by the computer system, the metadata staged in the on-premises storage to the cloud/object storage.
2. The method of claim 1 wherein staging the metadata for the snapshot in the on-premises storage comprises: creating and updating, in the on-premises storage, a B+ tree for the snapshot comprising internal nodes and leaf nodes, each leaf node representing a data block of the snapshot and including a pointer to a cloud physical block address (CPBA) in the cloud/object storage where the data block has been or will be uploaded.
3. The method of claim 1 wherein uploading the metadata comprises: packaging the metadata into one or more fixed-size metadata chunks and uploading the one or more fixed-size metadata chunks to the cloud/object storage.
4. The method of claim 1 wherein a predefined range of local physical block addresses (LPBAs) is reserved for snapshot metadata for the dataset in the on-premises storage, and wherein an identical predefined range of CPBAs is reserved for snapshot metadata for the dataset in the cloud/object storage.
5. The method of claim 4 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata sequentially from the predefined range of LPBAs.
6. The method of claim 5 wherein uploading the metadata staged in the on-premises storage to the cloud/object storage comprises: packaging the metadata into one or more fixed-size metadata chunks having sequential chunk identifiers from the predefined range of CPBAs and uploading the one or more fixed-size metadata chunks with the sequential chunk identifiers to the cloud/object storage, wherein the CPBA of each block of the metadata in the cloud/object storage maps directly to the LPBA of said each block of the metadata in the on-premises storage.
7. The method of claim 1 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata randomly from a local physical block address space on the on-premises storage.
8. A non-transitory computer readable storage medium having stored thereon program code executable by a computer system residing at an on-premises site comprising on-premises storage, the program code embodying a method for archiving data in cloud/object storage using local metadata staging, the method comprising: receiving a snapshot of a dataset to be archived, the snapshot including data of the dataset; packaging the data included in the snapshot into one or more fixed-size data chunks and uploading the one or more fixed-size data chunks to the cloud/object storage; concurrently with the packaging and the uploading, staging metadata for the snapshot in the on-premises storage; and upon uploading all of the one or more fixed-size data chunks, uploading the metadata staged in the on-premises storage to the cloud/object storage.
9. The non-transitory computer readable storage medium of claim 8 wherein staging the metadata for the snapshot in the on-premises storage comprises: creating and updating, in the on-premises storage, a B+ tree for the snapshot comprising internal nodes and leaf nodes, each leaf node representing a data block of the snapshot and including a pointer to a cloud physical block address (CPBA) in the cloud/object storage where the data block has been or will be uploaded.
10. The non-transitory computer readable storage medium of claim 8 wherein uploading the metadata comprises: packaging the metadata into one or more fixed-size metadata chunks and uploading the one or more fixed-size metadata chunks to the cloud/object storage.
11. The non-transitory computer readable storage medium of claim 8 wherein a predefined range of local physical block addresses (LPBAs) is reserved for snapshot metadata for the dataset in the on-premises storage, and wherein an identical predefined range of CPBAs is reserved for snapshot metadata for the dataset in the cloud/object storage.
12. The non-transitory computer readable storage medium of claim 11 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata sequentially from the predefined range of LPBAs.
13. The non-transitory computer readable storage medium of claim 12 wherein uploading the metadata staged in the on-premises storage to the cloud/object storage comprises: packaging the metadata into one or more fixed-size metadata chunks having sequential chunk identifiers from the predefined range of CPBAs and uploading the one or more fixed-size metadata chunks with the sequential chunk identifiers to the cloud/object storage, wherein the CPBA of each block of the metadata in the cloud/object storage maps directly to the LPBA of said each block of the metadata in the on-premises storage.
14. The non-transitory computer readable storage medium of claim 8 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata randomly from a local physical block address space on the on-premises storage.
15. A computer system residing at an on-premises site including on-premises storage, the computer system comprising: a processor; and a non-transitory computer readable medium having stored thereon program code that, when executed, causes the processor to: receive a snapshot of a dataset to be archived, the snapshot including data of the dataset; package the data included in the snapshot into one or more fixed-size data chunks and upload the one or more fixed-size data chunks to cloud/object storage; concurrently with the packaging and the uploading, stage metadata for the snapshot in the on-premises storage; and upon uploading all of the one or more fixed-size data chunks, upload the metadata staged in the on-premises storage to the cloud/object storage.
16. The computer system of claim 15 wherein staging the metadata for the snapshot in the on-premises storage comprises: creating and updating, in the on-premises storage, a B+ tree for the snapshot comprising internal nodes and leaf nodes, each leaf node representing a data block of the snapshot and including a pointer to a cloud physical block address (CPBA) in the cloud/object storage where the data block has been or will be uploaded.
17. The computer system of claim 15 wherein uploading the metadata comprises: packaging the metadata into one or more fixed-size metadata chunks and uploading the one or more fixed-size metadata chunks to the cloud/object storage.
18. The computer system of claim 15 wherein a predefined range of local physical block addresses (LPBAs) is reserved for snapshot metadata for the dataset in the on-premises storage, and wherein an identical predefined range of CPBAs is reserved for snapshot metadata for the dataset in the cloud/object storage.
19. The computer system of claim 18 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata sequentially from the predefined range of LPBAs.
20. The computer system of claim 19 wherein uploading the metadata staged in the on-premises storage to the cloud/object storage comprises: packaging the metadata into one or more fixed-size metadata chunks having sequential chunk identifiers from the predefined range of CPBAs and uploading the one or more fixed-size metadata chunks with the sequential chunk identifiers to the cloud/object storage, wherein the CPBA of each block of the metadata in the cloud/object storage maps directly to the LPBA of said each block of the metadata in the on-premises storage.
21. The computer system of claim 15 wherein staging the metadata for the snapshot in the on-premises storage comprises: allocating blocks for the metadata randomly from a local physical block address space on the on-premises storage.